Many popular linear classifiers, such as logistic regression, boosting, or SVM, are trained by optimizing a margin-based risk function. Traditionally, these risk functions are computed based on a labeled dataset. We develop a novel technique for estimating such risks using only unlabeled data and the marginal label distribution. We prove that the proposed risk estimator is consistent on high-dimensional datasets and demonstrate it on synthetic and real-world data. In particular, we show how the estimate is used for evaluating classifiers in transfer learning, and for training classifiers with no labeled data whatsoever.